---
title: "Lab 04 - Non-linear regression on dependency trees"
author: "Francisco Javier Jurado, Roger Pujol Torramorell"
date: "October 29, 2019"
output: pdf_document
---
TODO: comment the table.

Before going any further, it is good practice to check what the data looks like, so let's plot the mean dependency length \(d\) against the number of vertices \(n\).

We would like to check for possible power-law dependencies, and we can do so by plotting the data with logarithmic scales on both axes.
Note that, rather than log-transforming the data itself, we have set the plot axes to a logarithmic scale, so the ticks are spaced logarithmically while the values stay in their original units. The plots suggest a power law despite the large amount of dispersion.
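The difference between log-scaled axes and log-transformed data can be sketched in base R. This is only an illustration: the data are synthetic and the power-law parameters (0.7 and 0.35) are arbitrary assumptions, not values taken from the treebanks.

```r
# Synthetic data (hypothetical): a noisy power law d = 0.7 * n^0.35.
set.seed(1)
n <- rep(2:100, each = 5)
d <- 0.7 * n^0.35 * exp(rnorm(length(n), sd = 0.2))

pdf(NULL)  # discard graphics output so the sketch also runs non-interactively
# Log-scaled axes: the data stay in original units, only the tick spacing changes.
plot(n, d, log = "xy", xlab = "vertices", ylab = "mean dependency length")
# Log-transformed data: same point cloud shape, but the axes are in log units.
plot(log(n), log(d), xlab = "log(vertices)", ylab = "log(mean dependency length)")
invisible(dev.off())
```

Both plots show the same shape; the first keeps the tick labels readable as vertex counts and lengths.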
One way to deal with this dispersion and get a clearer intuition of the underlying trend is to average the mean length over all points with the same number of vertices.

Although there is still a significant amount of dispersion for larger values of \(n\), the shape of the trend is much clearer. The same averaged points can also be plotted on log-log axes.
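The averaging step can be sketched with `aggregate`. The data frame `tree_data` and its values are hypothetical toy data; only the column meanings (`n` for vertices, `d` for mean length) follow the text.

```r
# Hypothetical toy data: several trees share the same number of vertices n.
tree_data <- data.frame(
  n = c(3, 3, 4, 4, 4, 5),
  d = c(1.0, 1.5, 1.2, 1.8, 1.5, 2.0)
)
# One row per distinct n, holding the average of the mean lengths.
averaged <- aggregate(d ~ n, data = tree_data, FUN = mean)
print(averaged)
```

Each distinct value of `n` is reduced to a single point, which is what removes most of the vertical scatter in the plots.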
The averaged points form an almost straight line in the log-log plot (again with dispersion when \(n\) gets large), so we have reasonable evidence that \(d\) scales with \(n\) as a power law.
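The reason a straight line in log-log space points to a power law is that \(d = a n^b\) implies \(\log d = \log a + b \log n\), so the slope of the line is the exponent. A minimal sketch, using noiseless hypothetical data with arbitrary parameters (0.7 and 0.35), shows a linear fit recovering them:

```r
# Hypothetical noiseless data from d = 0.7 * n^0.35, used only for illustration.
n <- 2:50
d <- 0.7 * n^0.35
# A straight-line fit in log-log space recovers the power-law parameters.
fit <- lm(log(d) ~ log(n))
b_hat <- unname(coef(fit)[2])       # slope = exponent b
a_hat <- exp(unname(coef(fit)[1]))  # exp(intercept) = prefactor a
print(c(a = a_hat, b = b_hat))
```

On real data the same fit only gives rough estimates, since noise in \(d\) is distorted by the logarithm.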
We now want to see how far the real scaling of \(d\) is from the scaling expected in a random linear arrangement. For that purpose we compare the raw points with the averaged ones and with the expected mean length, given by \(E[\langle d \rangle] = (n+1)/3\). We plot all three on both a linear and a double-logarithmic scale.
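The expression \((n+1)/3\) is the average distance between two distinct positions chosen uniformly among \(1, \dots, n\). A small check by exhaustive enumeration (the helper `mean_random_length` is a hypothetical name introduced here):

```r
# Exact mean distance between two distinct positions among 1..n.
mean_random_length <- function(n) {
  pairs <- combn(n, 2)  # all unordered pairs of positions
  mean(abs(pairs[1, ] - pairs[2, ]))
}
n_values <- c(2, 5, 10, 20)
exact <- sapply(n_values, mean_random_length)
expected <- (n_values + 1) / 3
print(cbind(n = n_values, exact = exact, expected = expected))
```

The two columns agree for every \(n\), which is why this baseline needs no fitted parameters.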
Fitted parameters as printed, one row per outer list index and one column per named parameter of each inner element. Inner element `[[1]]` is `NA` for every index.

| Index | `[[2]]` b | `[[3]]` a | `[[3]]` b | `[[4]]` a | `[[4]]` c | `[[5]]` a |
|---|---|---|---|---|---|---|
| 1 | 0.3291662 | 0.7381135 | 0.3500622 | 1.642763871 | 0.009674106 | 0.7324749 |
| 2 | 0.4543944 | 0.958574 | 0.372823 | 2.19314540 | 0.01330892 | 1.000429 |
| 3 | 0.4183132 | 0.6714799 | 0.4583909 | 1.35534413 | 0.03097325 | 0.8677413 |
| 4 | 0.3486333 | 0.7011912 | 0.3822672 | 1.58556811 | 0.01362438 | 0.760291 |
| 5 | 0.3442046 | 0.7689222 | 0.3513017 | 1.68157789 | 0.01206634 | 0.7528104 |
| 6 | 0.5890211 | 0.6069758 | 0.6158401 | 2.39890712 | 0.02079203 | 1.375951 |
| 7 | 0.3729202 | 0.6339700 | 0.4664684 | 1.050008 | 0.049412 | 0.831403 |
| 8 | 0.3394035 | 0.6838920 | 0.3843947 | 1.50248587 | 0.01402991 | 0.7425137 |
| 9 | 0.367975 | 0.6258393 | 0.4373232 | 1.54001720 | 0.01617884 | 0.7866161 |
| 10 | 0.4064011 | 0.6237355 | 0.4739868 | 1.33141691 | 0.02715725 | 0.8508688 |
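Parameter estimates of this kind could be obtained with a non-linear least-squares fit. The sketch below assumes a power-law model \(a n^b\), which is an assumption on our part since the model formulas are not stated here, and uses synthetic data in place of the real averaged points.

```r
# Hypothetical synthetic data standing in for one language's averaged points.
set.seed(42)
n <- 2:60
d <- 0.7 * n^0.35 * exp(rnorm(length(n), sd = 0.05))

# Starting values from a straight-line fit in log-log space.
init <- coef(lm(log(d) ~ log(n)))
# Assumed model form: d = a * n^b (not confirmed by the report itself).
fit <- nls(d ~ a * n^b, start = list(a = exp(init[[1]]), b = init[[2]]))

print(coef(fit))  # fitted a and b
print(AIC(fit))   # AIC of this model, comparable across models on the same data
```

Seeding `nls` from the log-log fit tends to help convergence, because the optimizer starts close to a sensible solution.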
| Language | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Arabic | 30174.18 | 8221.94 | 8212.02 | 8999.49 | 8534.96 |
| English | 121969.37 | 39802.73 | 39270.11 | 41166.91 | 39323.03 |
| Basque | 14266.83 | 3608.46 | 3579.44 | 4107.36 | 3738.27 |
| Greek | 19991.77 | 5088.28 | 5069.89 | 5546.06 | 5201.95 |
| Catalan | 104292.47 | 23360.12 | 23357.73 | 24848.14 | 23593.28 |
| Hungarian | 38251.06 | 19381.66 | 19365.28 | 20604.62 | 21120.99 |
| Chinese | 180870.39 | 40216.32 | 37750.69 | 44273.01 | 44946.28 |
| Italian | 26583.88 | 6488.02 | 6427.65 | 7243.78 | 6677.05 |
| Czech | 150242.10 | 49508.13 | 49031.75 | 51945.91 | 50653.57 |
| Turkish | 30859.21 | 9929.75 | 9752.06 | 10864.78 | 10115.55 |
# Position of the lowest-AIC model per language, then that model's fitted parameters.
best_model <- lapply(model_AICs, function(x) which.min(as.vector(x)))
best_coefs <- lapply(seq_along(language_list), function(i) best_params[[i]][[best_model[[i]]]])
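As a worked example of the selection step, take the Arabic row of the table, assuming those values are the entries of `model_AICs` for that language. The vector name `aic_arabic` is introduced here for illustration only.

```r
# Values copied from the Arabic row of the table; names are the column labels.
aic_arabic <- c("0" = 30174.18, "1" = 8221.94, "2" = 8212.02,
                "3" = 8999.49, "4" = 8534.96)
best_pos <- which.min(as.vector(aic_arabic))  # 1-based position of the lowest value
best_label <- names(aic_arabic)[best_pos]     # column label of that model
delta_aic <- aic_arabic - min(aic_arabic)     # difference to the best model
print(best_pos)
print(best_label)
print(delta_aic)
```

Note that `which.min` returns a 1-based position, so position 3 corresponds to the column labelled 2. Used as an index into the parameter list, position 3 selects the element that holds `a` and `b`.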